On the maximum quartet distance between phylogenetic trees


Noga Alon* Humberto Naves† Benny Sudakov‡


Abstract 

A conjecture of Bandelt and Dress states that the maximum quartet distance between any
two phylogenetic trees on $n$ leaves is at most $(\frac{2}{3}+o(1))\binom{n}{4}$. Using the machinery of flag algebras
we improve the currently known bounds regarding this conjecture; in particular, we show that
the maximum is at most $(0.69+o(1))\binom{n}{4}$. We also give further evidence that the conjecture is
true by proving that the maximum distance between caterpillar trees is at most $(\frac{2}{3}+o(1))\binom{n}{4}$.


1 Introduction 

The practice of phylogenetic tree reconstruction to hypothesize various aspects of evolutionary relationships among different species of organisms has become a central problem in molecular biology.
For instance, the “Tree of Life” project [15] aims, among other things, to accurately construct a tree
representing the evolutionary history of the organismal lineages as they change through time.

A phylogeny (the evolutionary history of a set of species) is usually represented by a tree where 
the species under study are mapped to the leaves of the tree and the tree-structure represents the 
different evolutionary relationships among them. Here we focus solely on undirected (or unrooted) 
phylogenetic trees. In this setting, the underlying tree is not directed and each non-leaf node is 
incident to exactly three edges. The basic unit of information for phylogenetic classification is the
quartet, which is an undirected phylogenetic tree having exactly four leaves. We denote a quartet
over the leaves $\{a,b,c,d\}$ as $[ab|cd]$ whenever there is an edge in the underlying tree separating the
pair $\{a,b\}$ from the pair $\{c,d\}$, as Figure 1 shows. Note that a phylogenetic tree defined over a taxa
(species) set of size $n$ contains the information of exactly $\binom{n}{4}$ quartets.
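To make the notion concrete, here is a small sketch (not taken from the paper) that reads off every quartet of a tree. It assumes a tree is stored as an edge list with integer leaf labels; the helper names `caterpillar` and `quartet_set` are hypothetical. It relies on the four-point condition: in a trivalent tree with unit edge lengths, $[ab|cd]$ holds exactly when $d(a,b)+d(c,d)$ is the strictly smallest of the three pairwise distance sums.

```python
from collections import deque
from itertools import combinations


def caterpillar(order):
    """Edge list of the trivalent caterpillar whose leaves, read along the spine, are `order`."""
    spine = [("spine", i) for i in range(len(order) - 2)]
    edges = [(order[0], spine[0]), (order[-1], spine[-1])]
    edges += [(spine[i], spine[i + 1]) for i in range(len(spine) - 1)]
    edges += [(leaf, spine[i]) for i, leaf in enumerate(order[1:-1])]
    return edges


def quartet_set(edges):
    """Map every 4-set of leaves to its quartet, encoded as a frozenset of two leaf pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    leaves = sorted(x for x in adj if len(adj[x]) == 1)
    dist = {}
    for src in leaves:  # breadth-first search from every leaf
        seen, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        dist[src] = seen
    quartets = {}
    for a, b, c, d in combinations(leaves, 4):
        pairs = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
        # Four-point condition: the induced split has the strictly smallest distance sum.
        best = min(pairs, key=lambda p: dist[p[0][0]][p[0][1]] + dist[p[1][0]][p[1][1]])
        quartets[frozenset((a, b, c, d))] = frozenset(map(frozenset, best))
    return quartets


Q = quartet_set(caterpillar([1, 2, 3, 4, 5, 6]))
print(len(Q))                      # 15 = binom(6, 4)
print(Q[frozenset({1, 3, 4, 6})])  # the quartet [13|46]
```

The dictionary has exactly one entry per 4-set of leaves, matching the count of $\binom{n}{4}$ quartets.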




Figure 1: A quartet. 


Studying quartets is of prime importance not only because they are the smallest informational
units induced by a phylogeny, but also because they play a major role in many reconstruction
methods. Among them, quartet-based reconstruction is perhaps the most basic and most studied
approach (see e.g. [...]). The task of quartet-based reconstruction is to
find a tree over the full set of species that satisfies most of the given input quartets. In its full
generality this problem is very difficult: Steel [24] has shown that even deciding whether there is a tree
that satisfies all the input quartets is NP-complete. To aggravate matters, even the ideal case in
which all quartets agree on a single tree is very rare. Thus a natural problem arises, namely, finding
a tree maximizing the number of compatible quartets, known as maximum quartet compatibility (MQC) [21].
As MQC is clearly NP-hard, several approximation algorithms have been suggested. However, the
best known approximation for the general problem is still obtained by a naive “random labelling of
the leaves of a tree”, which has expected approximation ratio 1/3.

*Sackler School of Mathematics and Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978,
Israel and School of Mathematics, Institute for Advanced Study, Princeton, NJ 08540. Email: nogaa@tau.ac.il.
Research supported in part by a USA-Israeli BSF grant, by an ISF grant, by the Israeli I-Core program and by the
Oswald Veblen Fund.

†Institute for Mathematics and its Applications, University of Minnesota, Minneapolis, MN 55455, USA. Email:
hnaves@ima.umn.edu. This research was supported in part by the Institute for Mathematics and its Applications with
funds provided by the National Science Foundation.

‡Department of Mathematics, ETH, 8092 Zurich. Email: benjamin.sudakov@math.ethz.ch. Research supported
in part by SNSF grant 200021-149111 and by a USA-Israeli BSF grant.

Related to the problem of compatibility is the concept of quartet distance [9]. This notion is used
to measure the similarity of two phylogenetic trees by counting how many quartets
are compatible with both of them. More specifically, if $T_1$ and $T_2$ are two phylogenetic trees on $n$
leaves, let $qd(T_1,T_2)$ denote the difference between $\binom{n}{4}$ and the number of quartets compatible with
both $T_1$ and $T_2$. With this definition in mind, a natural question emerges: what is the maximum
quartet distance between two phylogenetic trees on $n$ leaves? Somewhat surprisingly, the answer is
strictly smaller than $\binom{n}{4}$. Bandelt and Dress showed that the maximum is always strictly smaller
than $[...]\binom{n}{4}$ for $n > 6$. They also conjectured that the ratio between the maximum quartet distance
and $\binom{n}{4}$ converges to $\frac{2}{3}$ as $n$ tends to infinity.
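The definition of $qd$ translates directly into a brute-force count. The sketch below compares, for every 4-set of leaves, the quartets induced by the two trees. It repeats the hypothetical `caterpillar` and `quartet_set` helpers (edge-list trees with integer leaves) so that it runs on its own, and it takes roughly $O(n^4)$ time, so it is only meant for small examples.

```python
from collections import deque
from itertools import combinations


def caterpillar(order):
    """Edge list of the trivalent caterpillar whose leaves, read along the spine, are `order`."""
    spine = [("spine", i) for i in range(len(order) - 2)]
    edges = [(order[0], spine[0]), (order[-1], spine[-1])]
    edges += [(spine[i], spine[i + 1]) for i in range(len(spine) - 1)]
    edges += [(leaf, spine[i]) for i, leaf in enumerate(order[1:-1])]
    return edges


def quartet_set(edges):
    """Map every 4-set of leaves to its quartet (four-point condition on BFS distances)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    leaves = sorted(x for x in adj if len(adj[x]) == 1)
    dist = {}
    for src in leaves:
        seen, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        dist[src] = seen
    quartets = {}
    for a, b, c, d in combinations(leaves, 4):
        pairs = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
        best = min(pairs, key=lambda p: dist[p[0][0]][p[0][1]] + dist[p[1][0]][p[1][1]])
        quartets[frozenset((a, b, c, d))] = frozenset(map(frozenset, best))
    return quartets


def quartet_distance(edges1, edges2):
    """qd(T1, T2): number of 4-sets of leaves on which T1 and T2 induce different quartets."""
    q1, q2 = quartet_set(edges1), quartet_set(edges2)
    if q1.keys() != q2.keys():
        raise ValueError("both trees must have the same leaf set")
    return sum(q1[four] != q2[four] for four in q1)


T1 = caterpillar([1, 2, 3, 4, 5])  # cherries {1,2} and {4,5}, leaf 3 in the middle
T2 = caterpillar([1, 4, 3, 2, 5])  # cherries {1,4} and {2,5}
T3 = caterpillar([2, 3, 1, 4, 5])  # cherries {2,3} and {4,5}, leaf 1 in the middle
print(quartet_distance(T1, T2), quartet_distance(T1, T3))  # 5 2
```

The three 5-leaf caterpillars are arbitrary illustrative inputs; $T_1$ and $T_2$ disagree on all $\binom{5}{4}=5$ quartets.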

Conjecture 1.1 (Bandelt and Dress). The maximum quartet distance between two phylogenetic
trees on $n$ leaves is $(\frac{2}{3}+o(1))\binom{n}{4}$.

Alon, Snir, and Yuster [1] further improved the bounds on the maximum quartet distance.
Namely, they proved that the maximum is always strictly larger than $\frac{2}{3}\binom{n}{4}$ but asymptotically
smaller than $[...]\binom{n}{4}$. The lower bound of $\frac{2}{3}\binom{n}{4}$ can again be obtained by the same “random labelling
of the leaves” argument; thus Conjecture 1.1 implies that the average distance between two random
trees is asymptotically the same as the maximum distance. We also remark that the problem of
maximizing the quartet distance between trees can be rephrased as asking how much a compatible set of
quartets can be violated, which is the opposite of MQC.

The main contribution of this paper is the following statement, which we obtain using the
machinery of flag algebras developed by Razborov in [18].

Theorem 1.2. The maximum quartet distance between two phylogenetic trees on $n$ leaves is at most
$(0.69+o(1))\binom{n}{4}$.

As further evidence that $\frac{2}{3}$ is the correct answer, we prove the following statement, which
establishes Conjecture 1.1 when restricted to caterpillar trees. By a caterpillar we mean a phylogenetic
tree having at most two vertices that are each adjacent to two leaves, as in Figure 2.



Figure 2: A caterpillar with n + 2 leaves. 

Theorem 1.3. The maximum quartet distance between two phylogenetic caterpillar trees on $n$ leaves
is at most $(\frac{2}{3}+o(1))\binom{n}{4}$.

The set of all caterpillar trees is a simple yet very important subclass of phylogenetic trees. For 
instance, the proof of NP-hardness of MQC by Steel [24] heavily relies on this particular subclass. 
Namely, deciding whether there exists a tree $T$ that satisfies all the quartets in a given input set is
NP-complete even if we further assume that $T$ is a caterpillar.

The rest of this paper is organized as follows. In Section 2 we formally define all the relevant
notions in phylogenetics that were briefly mentioned in this introduction. In Section 3 we provide
an informal explanation of our main tool, flag algebras. In Section 4 we discuss some of the details
of the proof of Theorem 1.2 and provide a link to the program establishing the proof. In Section 5
we give the proof of Theorem 1.3. Lastly, the final section contains some concluding
remarks and open problems.

2 Preliminaries 

A trivalent tree is a tree in which all internal vertices (the non-leaves) have exactly three neighbors.
Whenever the leaves of such a tree are labeled bijectively by a taxa (species) set $X$ of size $n$, we
call it a phylogenetic tree. Throughout this paper, unless stated otherwise, all trees are assumed
to be phylogenetic trees. For a tree $T=(V,E)$, the set of leaves of $T$ is denoted by $\mathcal{L}(T)$.

The removal of an edge $e$ in a phylogenetic tree splits it into two subtrees, and thus induces a
split among the leaves of the tree. We identify an edge $e$ with the split $(U_e,\mathcal{L}(T)\setminus U_e)$ it generates on
the set of leaves, and denote this split by $U_e$. As external edges (the ones adjacent to the leaves)
induce trivial splits, we consider only the splits induced by internal edges.
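The splits induced by internal edges can be listed by deleting each internal edge and collecting the leaves on one side. This is a minimal sketch under the same assumptions as before (edge-list trees with integer leaves; `caterpillar` and `internal_splits` are hypothetical names). A trivalent tree on $n\ge3$ leaves has $n-3$ internal edges, so the 6-leaf caterpillar below yields three splits.

```python
def caterpillar(order):
    """Edge list of the trivalent caterpillar whose leaves, read along the spine, are `order`."""
    spine = [("spine", i) for i in range(len(order) - 2)]
    edges = [(order[0], spine[0]), (order[-1], spine[-1])]
    edges += [(spine[i], spine[i + 1]) for i in range(len(spine) - 1)]
    edges += [(leaf, spine[i]) for i, leaf in enumerate(order[1:-1])]
    return edges


def internal_splits(edges):
    """Splits {U, complement of U} of the leaf set induced by the internal edges of a tree."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    leaves = frozenset(x for x in adj if len(adj[x]) == 1)
    splits = set()
    for u, v in edges:
        if u in leaves or v in leaves:
            continue  # external edges only induce trivial splits
        # Collect the leaves reachable from u once the edge {u, v} is deleted.
        side, stack, seen = set(), [u], {u, v}
        while stack:
            x = stack.pop()
            if x in leaves:
                side.add(x)
            for w in adj[x] - seen:
                seen.add(w)
                stack.append(w)
        U = frozenset(side)
        splits.add(frozenset((U, leaves - U)))
    return splits


S = internal_splits(caterpillar([1, 2, 3, 4, 5, 6]))
print(len(S))  # 3
```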

Let $T$ be a tree and $A\subseteq\mathcal{L}(T)$ a subset of the leaves of $T$. We denote by $T|_A$ the topological
subtree of $T$ induced by $A$, obtained by removing all leaves in $\mathcal{L}(T)\setminus A$ together with the paths leading
exclusively to them, and subsequently contracting internal vertices of degree two.

For two trees $T$ and $T'$, we say that $T$ satisfies $T'$ (or, equivalently, that $T'$ is satisfied by $T$)
if $\mathcal{L}(T')\subseteq\mathcal{L}(T)$ and $T|_{\mathcal{L}(T')}\cong T'$, that is, the subtree of $T$ induced by $\mathcal{L}(T')$ is isomorphic to $T'$.

Otherwise, $T'$ is violated by $T$. Let $\mathcal{T}=\{T_1,\dots,T_k\}$ be a set of trees with possibly overlapping
leaves, and denote by $\mathcal{L}(\mathcal{T})=\bigcup_i\mathcal{L}(T_i)$ the union of the sets of leaves of all trees $T_i\in\mathcal{T}$. Then for
a tree $T$ with $\mathcal{L}(T)=\mathcal{L}(\mathcal{T})$, we denote by $\mathcal{T}_s(T)$ the set of trees in $\mathcal{T}$ that are satisfied by $T$. We
say that $\mathcal{T}$ is compatible if there exists a tree $T^*$ over the set of leaves $\mathcal{L}(\mathcal{T})$ that satisfies every tree
$T_i\in\mathcal{T}$, i.e. $\mathcal{T}_s(T^*)=\mathcal{T}$ (see Figure 3). We denote by $co(\mathcal{T})$ the set of trees that satisfy $\mathcal{T}$ (up to
isomorphism), $co(\mathcal{T})=\{T:\mathcal{T}_s(T)=\mathcal{T}\}$.



Figure 3: A tree on five leaves and two quartets compatible with it. 


Further, we say that $T^*$ is defined by $\mathcal{T}$ if $co(\mathcal{T})$ is the singleton $\{T^*\}$. If there is no such
compatible tree $T^*$, we say that $\mathcal{T}$ is incompatible (i.e., $co(\mathcal{T})=\emptyset$).
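For very small leaf sets, $co(\mathcal{T})$ of a set of quartets can be computed by brute force: enumerate every trivalent tree on the leaves by stepwise leaf insertion (there are $(2n-5)!!$ of them) and keep those satisfying every input quartet. The sketch below assumes quartets are written as pairs of pairs, e.g. `((1, 2), (3, 4))` for $[12|34]$; `all_trees`, `quartet_set` and `compatible_trees` are hypothetical names, and the approach is exponential, so it only illustrates the definitions.

```python
from collections import deque
from itertools import combinations


def quartet_set(edges):
    """Map every 4-set of leaves to its quartet (four-point condition on BFS distances)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    leaves = sorted(x for x in adj if len(adj[x]) == 1)
    dist = {}
    for src in leaves:
        seen, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        dist[src] = seen
    quartets = {}
    for a, b, c, d in combinations(leaves, 4):
        pairs = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
        best = min(pairs, key=lambda p: dist[p[0][0]][p[0][1]] + dist[p[1][0]][p[1][1]])
        quartets[frozenset((a, b, c, d))] = frozenset(map(frozenset, best))
    return quartets


def all_trees(labels):
    """Yield each trivalent tree on `labels` (at least 3) exactly once, as an edge list:
    every new leaf subdivides each existing edge in turn (stepwise addition)."""
    def grow(edges, remaining, next_id):
        if not remaining:
            yield edges
            return
        w = ("internal", next_id)
        for i, (u, v) in enumerate(edges):
            rest = edges[:i] + edges[i + 1:]
            yield from grow(rest + [(u, w), (w, v), (w, remaining[0])], remaining[1:], next_id + 1)
    hub = ("internal", 0)
    yield from grow([(x, hub) for x in labels[:3]], list(labels[3:]), 1)


def compatible_trees(quartets, labels):
    """Brute-force co(T): all trees on `labels` that satisfy every quartet in `quartets`."""
    result = []
    for edges in all_trees(labels):
        Q = quartet_set(edges)
        if all(Q[frozenset(p + q)] == frozenset((frozenset(p), frozenset(q))) for p, q in quartets):
            result.append(edges)
    return result


print(len(list(all_trees([1, 2, 3, 4, 5, 6]))))                                  # 105
print(len(compatible_trees([((1, 2), (3, 4))], [1, 2, 3, 4, 5])))                  # 5
print(compatible_trees([((1, 2), (3, 4)), ((1, 3), (2, 4))], [1, 2, 3, 4, 5]))   # []
```

The last set is incompatible because it asks for two different quartets on the same four leaves.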

A quartet tree (or just a quartet for short) is a phylogenetic tree over four leaves $\{a,b,c,d\}$.
We denote a quartet over $\{a,b,c,d\}$ as $[ab|cd]$ if there exists an edge $e$ whose corresponding split
satisfies $a,b\in U_e$ and $c,d\notin U_e$. Quartets are the most elementary informational unit in a phylogenetic
tree, as a pair corresponds to a path in a tree and a triplet to a vertex (the unique vertex in the
intersection of all the pairwise paths connecting the three leaves). Every phylogenetic tree $T$ with
$n$ leaves defines $\binom{n}{4}$ quartets, one for each set of four leaves. Let $Q(T)$ denote this full quartet set
of $T$. It is well known that $Q(T)$ uniquely defines $T$. In fact, Colonius and Schulze [7] showed that
the following proposition holds.

Proposition 2.1 (Colonius and Schulze). Let $Q$ be a full quartet set over $n$ species. If every subset
of three quartets (a quartet triplet) is compatible, then $Q$ is compatible and there exists a unique tree
defined by $Q$. In fact, if for every five taxa $\{a,b,c,d,e\}$ the following holds:

$$\{[ae|bc],[ae|cd]\}\cap Q\neq\emptyset\ \Longrightarrow\ \big([ab|cd]\in Q\Rightarrow[be|cd]\in Q\big),\qquad(1)$$

then $Q$ is consistent.
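Condition (1) can be checked mechanically on a full quartet set by trying every ordered choice of five taxa. The sketch below assumes the full quartet set is stored as a dictionary from 4-sets to splits, as produced by the hypothetical `quartet_set` helper, and runs in $O(n^5)$ time. Changing a single quartet of a 5-leaf tree produces a full set that violates the condition, which the check detects.

```python
from collections import deque
from itertools import combinations, permutations


def caterpillar(order):
    """Edge list of the trivalent caterpillar whose leaves, read along the spine, are `order`."""
    spine = [("spine", i) for i in range(len(order) - 2)]
    edges = [(order[0], spine[0]), (order[-1], spine[-1])]
    edges += [(spine[i], spine[i + 1]) for i in range(len(spine) - 1)]
    edges += [(leaf, spine[i]) for i, leaf in enumerate(order[1:-1])]
    return edges


def quartet_set(edges):
    """Map every 4-set of leaves to its quartet (four-point condition on BFS distances)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    leaves = sorted(x for x in adj if len(adj[x]) == 1)
    dist = {}
    for src in leaves:
        seen, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        dist[src] = seen
    quartets = {}
    for a, b, c, d in combinations(leaves, 4):
        pairs = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
        best = min(pairs, key=lambda p: dist[p[0][0]][p[0][1]] + dist[p[1][0]][p[1][1]])
        quartets[frozenset((a, b, c, d))] = frozenset(map(frozenset, best))
    return quartets


def has(Q, x, y, z, w):
    """True if the quartet [xy|zw] belongs to the full quartet set Q."""
    return Q[frozenset((x, y, z, w))] == frozenset((frozenset((x, y)), frozenset((z, w))))


def satisfies_condition_1(Q, taxa):
    """Check condition (1) for every ordered choice (a, b, c, d, e) of five distinct taxa."""
    for a, b, c, d, e in permutations(taxa, 5):
        premise = has(Q, a, e, b, c) or has(Q, a, e, c, d)
        if premise and has(Q, a, b, c, d) and not has(Q, b, e, c, d):
            return False
    return True


Q = quartet_set(caterpillar([1, 2, 3, 4, 5]))
print(satisfies_condition_1(Q, [1, 2, 3, 4, 5]))       # True: Q is the quartet set of a tree
broken = dict(Q)
broken[frozenset({1, 2, 3, 4})] = frozenset({frozenset({1, 3}), frozenset({2, 4})})  # [12|34] -> [13|24]
print(satisfies_condition_1(broken, [1, 2, 3, 4, 5]))  # False
```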

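Lastly, we would like to briefly sketch the “random labelling of the leaves” argument. Let $T$ be
any tree with $n$ leaves labeled by a taxa set $X$. Consider a random bijection $\pi$ between $X$ and the
leaves of $T$, and denote the corresponding labeled tree by $T^{\pi}$. As each of the $n!$ possible bijections is
equally likely, a quartet $[ab|cd]$ with labels from $X$ is satisfied by $T^{\pi}$ with probability
exactly $1/3$: the four labels land on a uniformly random set of four leaves, on which $T$ induces one of the three
possible quartets, and exactly one third of the assignments of $a,b,c,d$ to these leaves reproduce the pairing $\{a,b\},\{c,d\}$.
Thus, by linearity of expectation, we have: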

Proposition 2.2. Let $Q$ be an arbitrary set of quartets over a taxa set $X$ of size $n$, and let $\pi$ be
a random bijection between the leaves of a tree $T$ and $X$. The expected number of elements in $Q$
satisfied by $T^{\pi}$ is $|Q|/3$.

As a consequence, we have the next statement. 

Proposition 2.3. Let $T_1$ and $T_2$ be two random phylogenetic trees over the same taxa set $X$ of
size $n$, sampled independently and uniformly at random. The expected value of the quartet distance
$qd(T_1,T_2)$ is $\frac{2}{3}\binom{n}{4}$.
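Both propositions can be verified exactly on five leaves, where there are only 15 phylogenetic trees. The sketch below counts the bijections $\pi$ for which $T^{\pi}$ satisfies $[12|34]$, and averages $qd$ over all ordered pairs of trees, using exact fractions. The helper names are hypothetical, and the brute force is feasible only for tiny $n$.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations, permutations
from math import comb


def caterpillar(order):
    """Edge list of the trivalent caterpillar whose leaves, read along the spine, are `order`."""
    spine = [("spine", i) for i in range(len(order) - 2)]
    edges = [(order[0], spine[0]), (order[-1], spine[-1])]
    edges += [(spine[i], spine[i + 1]) for i in range(len(spine) - 1)]
    edges += [(leaf, spine[i]) for i, leaf in enumerate(order[1:-1])]
    return edges


def quartet_set(edges):
    """Map every 4-set of leaves to its quartet (four-point condition on BFS distances)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    leaves = sorted(x for x in adj if len(adj[x]) == 1)
    dist = {}
    for src in leaves:
        seen, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        dist[src] = seen
    quartets = {}
    for a, b, c, d in combinations(leaves, 4):
        pairs = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
        best = min(pairs, key=lambda p: dist[p[0][0]][p[0][1]] + dist[p[1][0]][p[1][1]])
        quartets[frozenset((a, b, c, d))] = frozenset(map(frozenset, best))
    return quartets


def all_trees(labels):
    """Yield each trivalent tree on `labels` exactly once (stepwise leaf insertion)."""
    def grow(edges, remaining, next_id):
        if not remaining:
            yield edges
            return
        w = ("internal", next_id)
        for i, (u, v) in enumerate(edges):
            rest = edges[:i] + edges[i + 1:]
            yield from grow(rest + [(u, w), (w, v), (w, remaining[0])], remaining[1:], next_id + 1)
    hub = ("internal", 0)
    yield from grow([(x, hub) for x in labels[:3]], list(labels[3:]), 1)


labels = [1, 2, 3, 4, 5]
Q = quartet_set(caterpillar(labels))
target = frozenset({frozenset({1, 2}), frozenset({3, 4})})  # the quartet [12|34]

# Proposition 2.2 for one quartet: relabel T by every bijection pi and test [12|34].
hits = 0
for image in permutations(labels):
    pi = dict(zip(labels, image))
    relabeled = {frozenset(pi[x] for x in four): frozenset(frozenset(pi[x] for x in pair) for pair in split)
                 for four, split in Q.items()}
    hits += relabeled[frozenset({1, 2, 3, 4})] == target
ratio = Fraction(hits, len(list(permutations(labels))))

# Proposition 2.3: average quartet distance over all ordered pairs of trees.
trees = [quartet_set(e) for e in all_trees(labels)]
total = sum(sum(q1[f] != q2[f] for f in q1) for q1 in trees for q2 in trees)
average = Fraction(total, len(trees) ** 2)
print(ratio, average, Fraction(2, 3) * comb(5, 4))  # 1/3 10/3 10/3
```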

3 Flag algebra calculus 

In this section we provide a brief introduction to the technique of flag algebras. First introduced by
Razborov in [18], it has been applied with great success to a wide variety of problems in extremal
combinatorics (see, for example, [2, 3, 8, 10, 11, 16, 17, 19, 20] and many others).

We begin with a brief explanation of how to map the problem of finding the maximum quartet
distance into a problem in extremal combinatorics. We then give a general overview of the
flag algebra calculus in the second subsection, introducing some key definitions and providing
some intuition behind the machinery. The third subsection shows how we express extremal
problems in the language of flag algebras. Our goal is to be neither rigorous nor thorough, but
rather to emphasize that the combinatorial arguments behind the flag algebra calculus are as old as
extremal combinatorics itself. Indeed, the main tools available to us are double counting and the
Cauchy-Schwarz inequality.

The flag algebra calculus is powerful because it provides a formalism through which the combinatorial
problem can be reduced to a semi-definite programming (SDP) problem. This in turn enables
the use of computers to find solutions, with rigorous proofs, to problems in extremal combinatorics.
For a more complete survey of the technique, we refer the reader to the excellent expositions in
[...] and [...], while for a technical specification of flag algebras, we suggest the original paper of
Razborov [18].
3.1 The model


In this section, the main object of interest is the tree-pair. From two phylogenetic trees $T_1$ and $T_2$
labelled by the same taxa set, we would like to create a simple object that “represents” the pair
$(T_1,T_2)$ in such a way that we can still compute the quartet distance $qd(T_1,T_2)$ from it. Note that
the actual set of labels (the taxa set) is irrelevant in the computation of this distance, so this object
shall have no labels at all. A natural and amenable definition comes to mind. A tree-pair $D$ is
a pair of trivalent trees $D=(T_1,T_2)$ (i.e., unlabelled phylogenetic trees) having the same set of
leaves but having no other vertex in common. In that case we write $\mathcal{L}(D):=\mathcal{L}(T_1)=\mathcal{L}(T_2)$. From
two phylogenetic trees $T_1$ and $T_2$ over the same taxa set, one can construct a tree-pair $D$ in the
following way. We first identify leaves of $T_1$ and $T_2$ having the same label, and we subsequently
remove the labels from $T_1$ and $T_2$ altogether to obtain $T_1'$ and $T_2'$, respectively; then $D=(T_1',T_2')$. We often represent a
tree-pair $D=(T_1,T_2)$ by the graph $T_1\cup T_2$, which is the union of $T_1$ and $T_2$, that is, $V(T_1\cup T_2)=V(T_1)\cup V(T_2)$
and $E(T_1\cup T_2)=E(T_1)\cup E(T_2)$, with $T_1$ positioned “on top” of $T_2$ (see Figure 4).



Figure 4: Two trees over the same taxa set and the tree-pair formed by their union. 

Two tree-pairs $D=(T_1,T_2)$ and $D'=(T_1',T_2')$ are isomorphic if there exists an isomorphism
between the graphs $T_1\cup T_2$ and $T_1'\cup T_2'$ that maps vertices of $T_i$ to vertices of $T_i'$ for $i=1,2$, i.e., it
respects the component trees. To indicate that two tree-pairs are isomorphic we write $D\cong D'$. For
a tree-pair $D=(T_1,T_2)$ and a subset $A\subseteq\mathcal{L}(D)$, we write $D|_A$ to denote the sub-tree-pair induced by
$A$, that is, $D|_A:=(T_1|_A,T_2|_A)$. Recall that to obtain $T_1|_A$ and $T_2|_A$ we keep the smallest subtrees of
$T_1$ and $T_2$ (respectively) containing the leaves in $A$. We subsequently delete, one by one, all vertices
of degree two by replacing the corresponding path of length 2 by an edge, until all the vertices not in
$A$ have degree 3. For two tree-pairs $D$ and $D'$, if there exists a set $A\subseteq\mathcal{L}(D)$ such that $D|_A\cong D'$,
then we say that $D$ contains an induced copy of $D'$.

There are only two non-isomorphic tree-pairs having exactly four leaves, namely $id_4$ and $cr_4$ (see
Figure 5): in $id_4$ both trees induce the same quartet, while in $cr_4$ they induce different quartets.



Figure 5: The two non-isomorphic tree-pairs on 4 leaves. 


A moment's thought reveals that the quartet distance $qd(T_1,T_2)$ can be computed as follows. Let
$D$ be the tree-pair obtained by joining $T_1$ and $T_2$. Then $qd(T_1,T_2)$ is simply the number
of induced copies of $cr_4$ in $D$. Hence to prove Theorem 1.2 it suffices to show that the following is
true.
Theorem 3.1. Let $D$ be any tree-pair on $n$ leaves. The number of induced copies of $cr_4$ in $D$ is at
most $(0.69+o(1))\binom{n}{4}$.

3.2 Definitions and notation in the calculus 

The flag algebra calculus is typically used to find the extremal density of some fixed structure in a
given family of combinatorial objects. In our case (see Theorem 3.1), it will be used to maximize
the density of $cr_4$ among all tree-pairs of size $n$, for $n$ sufficiently large. While the theory of flag
algebras is very general and can be applied to several different types of problems, we will explain it
using only examples related to our particular setting.

A type $\sigma$ is a labeled tree-pair using labels from $[k]$, where $k=|\mathcal{L}(\sigma)|$. That is, each leaf in $\mathcal{L}(\sigma)$
is associated with a distinct label from $[k]$, where $k$ is a nonnegative integer. The size of $\sigma$ is the integer $k$,
and is denoted by $|\sigma|$. Figure 6 shows some examples of types.


Figure 6: Examples of types (dot, diamond, star).


In what follows, an isomorphism between tree-pairs must preserve any labels that are present.
Given a type $\sigma$, a $\sigma$-flag is a tree-pair $F$ on a partially labeled set of leaves, such that the sub-tree-pair
induced by the labeled leaves is isomorphic to $\sigma$. The underlying tree-pair of the flag $F$ is the
tree-pair obtained from $F$ by removing all labels. The size of a flag is its number of leaves, that is, $|\mathcal{L}(F)|$. Note
that when $\sigma$ is the trivial type of size 0 (denoted by $\sigma=0$), a $\sigma$-flag is just a usual unlabeled tree-pair.
We write $\mathcal{F}_\ell^\sigma$ for the collection of all $\sigma$-flags of size $\ell$ (up to isomorphism). In Figure 7 we
list all flags in $\mathcal{F}_4^{\mathrm{dot}}$. Let $\mathcal{F}^\sigma=\bigcup_{\ell\ge|\sigma|}\mathcal{F}_\ell^\sigma$. When the type $\sigma$ is trivial, we omit the superscript
from our notation.



Figure 7: Family $\mathcal{F}_4^{\mathrm{dot}}$.


Let us now define two fundamental concepts in our calculus, namely those of flag densities in
larger flags and in tree-pairs. Let $\sigma$ be a type of size $k$, let $m\ge1$ be an integer and let $\{F_i\}_{i=1}^m$ be a
collection of $\sigma$-flags of sizes $\ell_i=|F_i|\ge k$. Given a $\sigma$-flag $F$ of order at least $\ell=k+\sum_{i=1}^m(\ell_i-k)$,
let $A\subseteq\mathcal{L}(F)$ be the set of labeled leaves of $F$. Now select disjoint subsets $X_i\subseteq\mathcal{L}(F)\setminus A$ of sizes
$|X_i|=\ell_i-k$, uniformly at random. This is possible because $F$ has at least $\sum_{i=1}^m(\ell_i-k)$ unlabeled leaves.
Denote by $E_i$ the event that the $\sigma$-flag induced by $A\cup X_i$ is isomorphic to $F_i$, for $i\in[m]$. We define
$p_\sigma(F_1,F_2,\dots,F_m;F):=\Pr\big[\bigcap_{i=1}^mE_i\big]$ to be the probability that all these events occur simultaneously.

If $D$ is just a tree-pair of order at least $\ell$, and not a $\sigma$-flag, then there is no pre-labeled set of leaves
$A$ that induces the type $\sigma$. Instead, we select uniformly at random an injective partial labeling $L:[k]\to\mathcal{L}(D)$.
This random labeling turns $D$ into a $\sigma'$-flag $D^L$, where the type $\sigma'$ is the labeled sub-tree-pair induced
by the set of leaves $L([k])$. If $\sigma'=\sigma$, we can then proceed as above; otherwise we say the events
$E_i$ have probability 0. Finally, we average over all possible random labelings. Formally, let $Y$ be
the following random variable:

$$Y:=\begin{cases}p_\sigma(F_1,F_2,\dots,F_m;D^L) & \text{if }\sigma'=\sigma,\\ 0 & \text{otherwise.}\end{cases}$$


Define $d_\sigma(F_1,\dots,F_m;D):=\mathbb{E}[Y]$ as the expected value of the random variable $Y$. The quantities
$p_\sigma(F_1,F_2,\dots,F_m;F)$ and $d_\sigma(F_1,F_2,\dots,F_m;D)$ are called flag densities of $\{F_i\}_{i\in[m]}$ in $F$ and in $D$,
respectively. Clearly these flag densities are the same whenever $\sigma=0$, in which case we omit the
subscript from both notations.

To better illustrate these definitions, we give some examples. These flags are shown in Figure 8.



Figure 8: Example flags.


We now compute the flag densities of $id_4^1$ and $cr_4^1$ in the flag $Z$. For example, to compute
$p_{\mathrm{dot}}(id_4^1;Z)$, note that to induce a copy of $id_4^1$ we must choose exactly 3 other unlabeled leaves which,
together with the labeled leaf 1, induce a copy of $id_4^1$. There are $\binom{6}{3}=20$ ways to choose
the 3 unlabeled leaves, and out of these 20 exactly 15 induce a copy of $id_4^1$; thus $p_{\mathrm{dot}}(id_4^1;Z)=\frac{15}{20}=\frac{3}{4}$.
Similarly we obtain $p_{\mathrm{dot}}(cr_4^1;Z)=\frac{1}{4}$. We can also compute the joint flag densities of multiple flags.
For instance, let us consider $p_{\mathrm{dot}}(id_4^1,id_4^1;Z)$. In this case, we first choose a set $X_1$ of 3 unlabeled
leaves uniformly at random and we subsequently choose a set $X_2$ of 3 other unlabeled leaves, also
uniformly at random. Since the choice of $X_2$ is uniquely determined by the choice of $X_1$, there
are exactly $\binom{6}{3}=20$ possible choices for the pair $(X_1,X_2)$. Out of these 20 choices, one can count that
exactly 10 are such that both $X_1$ and $X_2$ induce a copy of $id_4^1$ when we add
the labeled leaf. The order matters here: when computing $p(F_1,F_2;F)$, the set $X_1$ must induce
a copy of $F_1$ while $X_2$ must induce a copy of $F_2$. Thus $p_{\mathrm{dot}}(id_4^1,id_4^1;Z)=\frac{10}{20}=\frac{1}{2}$. Similarly, we have
$p_{\mathrm{dot}}(id_4^1,cr_4^1;Z)=p_{\mathrm{dot}}(cr_4^1,id_4^1;Z)=\frac{1}{4}$ and $p_{\mathrm{dot}}(cr_4^1,cr_4^1;Z)=0$.
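The flag $Z$ itself is only given in Figure 8, so the sketch below does not reproduce its numbers. Instead it computes $p_{\mathrm{dot}}(id_4^1;F)$, $p_{\mathrm{dot}}(cr_4^1;F)$ and the ordered joint densities for a hypothetical 7-leaf dot-flag built from two caterpillars, following the definition: pick disjoint sets $X_1$, $X_2$ of three unlabeled leaves and check which 4-leaf dot-flag each induces together with the labeled leaf. It relies on the fact that $id_4$ and $cr_4$ each admit a single dot-flag, because their automorphism groups act transitively on the four leaves. All function names are hypothetical.

```python
from collections import Counter, deque
from fractions import Fraction
from itertools import combinations


def caterpillar(order):
    """Edge list of the trivalent caterpillar whose leaves, read along the spine, are `order`."""
    spine = [("spine", i) for i in range(len(order) - 2)]
    edges = [(order[0], spine[0]), (order[-1], spine[-1])]
    edges += [(spine[i], spine[i + 1]) for i in range(len(spine) - 1)]
    edges += [(leaf, spine[i]) for i, leaf in enumerate(order[1:-1])]
    return edges


def quartet_set(edges):
    """Map every 4-set of leaves to its quartet (four-point condition on BFS distances)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    leaves = sorted(x for x in adj if len(adj[x]) == 1)
    dist = {}
    for src in leaves:
        seen, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        dist[src] = seen
    quartets = {}
    for a, b, c, d in combinations(leaves, 4):
        pairs = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
        best = min(pairs, key=lambda p: dist[p[0][0]][p[0][1]] + dist[p[1][0]][p[1][1]])
        quartets[frozenset((a, b, c, d))] = frozenset(map(frozenset, best))
    return quartets


def dot_flag_densities(edges1, edges2, labeled):
    """p_dot(F; Z) and p_dot(F1, F2; Z) for F, F1, F2 in {"id", "cr"}, where Z is the
    dot-flag obtained from the tree-pair (T1, T2) by labeling the leaf `labeled`."""
    q1, q2 = quartet_set(edges1), quartet_set(edges2)
    others = sorted({x for four in q1 for x in four} - {labeled})

    def kind(X):  # which 4-leaf dot-flag do the labeled leaf and X induce?
        four = frozenset(X) | {labeled}
        return "id" if q1[four] == q2[four] else "cr"

    triples = list(combinations(others, 3))
    single = Counter(kind(X) for X in triples)
    joint = Counter()
    for X1 in triples:  # ordered pairs (X1, X2) of disjoint 3-sets
        rest = [x for x in others if x not in X1]
        for X2 in combinations(rest, 3):
            joint[kind(X1), kind(X2)] += 1
    total = sum(joint.values())
    kinds = ("id", "cr")
    return ({k: Fraction(single[k], len(triples)) for k in kinds},
            {(a, b): Fraction(joint[a, b], total) for a in kinds for b in kinds})


T1 = caterpillar([1, 2, 3, 4, 5, 6, 7])
T2 = caterpillar([1, 3, 5, 7, 2, 4, 6])
single, joint = dot_flag_densities(T1, T2, labeled=1)
print(single)
print(joint)
```

With six unlabeled leaves, $X_2$ is forced to be the complement of $X_1$, exactly as in the computation for $Z$; hence $p_{\mathrm{dot}}(id_4^1,cr_4^1;F)=p_{\mathrm{dot}}(cr_4^1,id_4^1;F)$ and summing the joint densities over the second flag recovers the single ones.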

The computation of flag densities $d_{\mathrm{dot}}$ for unlabeled tree-pairs is a little more involved. To see
how to compute it, we consider the tree-pair $W$ depicted in Figure 8 as an example. There are two non-isomorphic
dot-flags whose underlying tree-pair is $W$, namely $W_1$ and $W_2$, as shown in Figure 9.


Figure 9: dot-flags for $W$ (panels $W_1$ and $W_2$).

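If we randomly label a leaf of $W$, then with probability [...] it becomes $W_1$ and with
probability [...] it becomes $W_2$. Moreover, since $p_{\mathrm{dot}}(id_4^1;W_1)=[...]$ and $p_{\mathrm{dot}}(id_4^1;W_2)=[...]$, we have
$d_{\mathrm{dot}}(id_4^1;W)=\Pr[W_1]\,p_{\mathrm{dot}}(id_4^1;W_1)+\Pr[W_2]\,p_{\mathrm{dot}}(id_4^1;W_2)=[...]$, where $\Pr[W_i]$ denotes the
probability that the random labeling produces $W_i$. Similarly we have $p_{\mathrm{dot}}(cr_4^1;W_1)=[...]$ and
$p_{\mathrm{dot}}(cr_4^1;W_2)=[...]$, hence $d_{\mathrm{dot}}(cr_4^1;W)=[...]$.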

The reader might notice that there is an alternative way to compute, say, $d_{\mathrm{dot}}(cr_4^1;W)$: we simply
compute the product $d_{\mathrm{dot}}(cr_4^1;cr_4)\cdot p(cr_4;W)=1\cdot p(cr_4;W)$. In general, suppose as before that we
have a type $\sigma$ of size $k$, a $\sigma$-flag $F$ of size $\ell\ge k$, and an unlabeled tree-pair $D$. To compute $d_\sigma(F;D)$,
we averaged over all random partial labelings of $D$ the probability of finding a flag isomorphic to
$F$. A simple double-counting argument shows that we can do the “averaging” before the random
labeling, which is the idea behind Razborov’s averaging operator, as defined in Section 2.2 of [18].
Let $F|_0$ denote the unlabeled tree-pair underlying $F$. We can compute $d_\sigma(F;D)$ by first computing
$d(F|_0;D)$, the probability that $\ell$ randomly chosen leaves of $D$ induce a copy of $F|_0$. Given this copy of $F|_0$,
we then randomly label $k$ of its $\ell$ leaves, and compute the probability that the resulting flag is
isomorphic to $F$. This amounts to multiplying $d(F|_0;D)$ by a normalizing factor $q_\sigma(F)$, that is,
$d_\sigma(F;D)=q_\sigma(F)\,d(F|_0;D)=q_\sigma(F)\,p(F|_0;D)$. We can interpret the normalizing factor as $q_\sigma(F)=d_\sigma(F;F|_0)$.
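For the dot type the normalizing factor of $id_4^1$ is $q_{\mathrm{dot}}(id_4^1)=1$, since labeling any leaf of $id_4$ yields $id_4^1$. The sketch below checks the resulting identity $d_{\mathrm{dot}}(id_4^1;D)=p(id_4;D)$ exactly: the left side is computed by averaging over the labeled leaf, as in the definition of $d_\sigma$, and the right side by counting 4-sets. The two example trees and all helper names are hypothetical.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def caterpillar(order):
    """Edge list of the trivalent caterpillar whose leaves, read along the spine, are `order`."""
    spine = [("spine", i) for i in range(len(order) - 2)]
    edges = [(order[0], spine[0]), (order[-1], spine[-1])]
    edges += [(spine[i], spine[i + 1]) for i in range(len(spine) - 1)]
    edges += [(leaf, spine[i]) for i, leaf in enumerate(order[1:-1])]
    return edges


def quartet_set(edges):
    """Map every 4-set of leaves to its quartet (four-point condition on BFS distances)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    leaves = sorted(x for x in adj if len(adj[x]) == 1)
    dist = {}
    for src in leaves:
        seen, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        dist[src] = seen
    quartets = {}
    for a, b, c, d in combinations(leaves, 4):
        pairs = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
        best = min(pairs, key=lambda p: dist[p[0][0]][p[0][1]] + dist[p[1][0]][p[1][1]])
        quartets[frozenset((a, b, c, d))] = frozenset(map(frozenset, best))
    return quartets


def p_id4(q1, q2):
    """p(id_4; D): fraction of 4-sets on which the two trees of D induce the same quartet."""
    return Fraction(sum(q1[f] == q2[f] for f in q1), len(q1))


def d_dot_id4(q1, q2):
    """d_dot(id_4^1; D): average over a uniformly random labeled leaf v of p_dot(id_4^1; D^v)."""
    leaves = sorted({x for four in q1 for x in four})
    total = Fraction(0)
    for v in leaves:
        fours = [f for f in q1 if v in f]  # v together with 3 unlabeled leaves
        total += Fraction(sum(q1[f] == q2[f] for f in fours), len(fours))
    return total / len(leaves)


q1 = quartet_set(caterpillar([1, 2, 3, 4, 5, 6, 7]))
q2 = quartet_set(caterpillar([2, 7, 4, 1, 6, 3, 5]))
print(d_dot_id4(q1, q2) == p_id4(q1, q2))  # True
```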

There are more relations involving $d_\sigma$ and $p_\sigma$ than the one mentioned previously. We now
state, without proof, a basic fact about flag densities that can be proved easily by double counting.

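Proposition 3.2 (Chain rule). If $\sigma$ is a type of size $k$, $m\ge1$ is an integer, $\{F_i\}_{i=1}^m$ is a family
of $\sigma$-flags of sizes $|F_i|=\ell_i$, and $\ell\ge k+\sum_{i=1}^m(\ell_i-k)$ is an integer parameter, then

1. For any $\sigma$-flag $F$ of order at least $\ell$, we have
$$p_\sigma(F_1,\dots,F_m;F)=\sum_{F'\in\mathcal{F}_\ell^\sigma}p_\sigma(F_1,\dots,F_m;F')\,p_\sigma(F';F).$$

2. For any tree-pair $D$ of size at least $\ell$, we have
$$d_\sigma(F_1,\dots,F_m;D)=\sum_{H\in\mathcal{F}_\ell}d_\sigma(F_1,\dots,F_m;H)\,d(H;D)=\sum_{F\in\mathcal{F}_\ell^\sigma}p_\sigma(F_1,\dots,F_m;F)\,d_\sigma(F;D).$$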

To illustrate the chain rule for $m=1$ and $\sigma=0$, we consider the “expansion” of $id_4$ in $\mathcal{F}_5$ (see
Figure 10).



Figure 10: Family $\mathcal{F}_5$.


The chain rule gives

$$\begin{aligned}
p(id_4;F)&=p(id_4;id_5)\,p(id_5;F)+p(id_4;cr_5^1)\,p(cr_5^1;F)+p(id_4;cr_5^2)\,p(cr_5^2;F)+p(id_4;cr_5^3)\,p(cr_5^3;F)\\
&=p(id_5;F)+\tfrac{3}{5}\,p(cr_5^1;F)+\tfrac{1}{5}\,p(cr_5^2;F).
\end{aligned}$$

Similarly, we can expand $p(cr_4;F)=\frac{2}{5}\,p(cr_5^1;F)+\frac{4}{5}\,p(cr_5^2;F)+p(cr_5^3;F)$.

For ease of notation, we can express these two identities using the syntax of flag algebras:

$$id_4=id_5+\tfrac{3}{5}\,cr_5^1+\tfrac{1}{5}\,cr_5^2,\qquad cr_4=\tfrac{2}{5}\,cr_5^1+\tfrac{4}{5}\,cr_5^2+cr_5^3.$$
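The coefficients in these identities can be cross-checked by brute force. Fixing one tree on five leaves and letting the second tree run over all 15 trees, the number of shared quartets takes only the values 5, 3, 1 and 0, i.e. $id_4$-densities $1,\frac{3}{5},\frac{1}{5},0$ and complementary $cr_4$-densities $0,\frac{2}{5},\frac{4}{5},1$. The sketch also checks the chain rule for $\sigma=0$ on a hypothetical 8-leaf tree-pair, comparing $p(id_4;D)$ with its expansion over 5-leaf sub-tree-pairs. All helper names are hypothetical.

```python
from collections import Counter, deque
from fractions import Fraction
from itertools import combinations


def caterpillar(order):
    """Edge list of the trivalent caterpillar whose leaves, read along the spine, are `order`."""
    spine = [("spine", i) for i in range(len(order) - 2)]
    edges = [(order[0], spine[0]), (order[-1], spine[-1])]
    edges += [(spine[i], spine[i + 1]) for i in range(len(spine) - 1)]
    edges += [(leaf, spine[i]) for i, leaf in enumerate(order[1:-1])]
    return edges


def quartet_set(edges):
    """Map every 4-set of leaves to its quartet (four-point condition on BFS distances)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    leaves = sorted(x for x in adj if len(adj[x]) == 1)
    dist = {}
    for src in leaves:
        seen, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        dist[src] = seen
    quartets = {}
    for a, b, c, d in combinations(leaves, 4):
        pairs = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
        best = min(pairs, key=lambda p: dist[p[0][0]][p[0][1]] + dist[p[1][0]][p[1][1]])
        quartets[frozenset((a, b, c, d))] = frozenset(map(frozenset, best))
    return quartets


def all_trees(labels):
    """Yield each trivalent tree on `labels` exactly once (stepwise leaf insertion)."""
    def grow(edges, remaining, next_id):
        if not remaining:
            yield edges
            return
        w = ("internal", next_id)
        for i, (u, v) in enumerate(edges):
            rest = edges[:i] + edges[i + 1:]
            yield from grow(rest + [(u, w), (w, v), (w, remaining[0])], remaining[1:], next_id + 1)
    hub = ("internal", 0)
    yield from grow([(x, hub) for x in labels[:3]], list(labels[3:]), 1)


labels = [1, 2, 3, 4, 5]
base = quartet_set(caterpillar(labels))
shared = Counter(sum(base[f] == q[f] for f in base) for q in map(quartet_set, all_trees(labels)))
print(sorted(shared.items()))  # [(0, 2), (1, 8), (3, 4), (5, 1)]


def chain_rule_sides(q1, q2):
    """p(id_4; D) computed directly and via the expansion over 5-leaf sub-tree-pairs."""
    leaves = sorted({x for four in q1 for x in four})
    direct = Fraction(sum(q1[f] == q2[f] for f in q1), len(q1))
    fives = list(combinations(leaves, 5))
    expanded = sum(
        Fraction(sum(q1[frozenset(f)] == q2[frozenset(f)] for f in combinations(S, 4)), 5)
        for S in fives
    ) / len(fives)
    return direct, expanded


direct, expanded = chain_rule_sides(quartet_set(caterpillar(list(range(1, 9)))),
                                    quartet_set(caterpillar([3, 8, 1, 6, 2, 7, 4, 5])))
print(direct == expanded)  # True
```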

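In this syntax, the equation $\sum_{i\in I}\alpha_iF_i=0$ means that for all sufficiently large $\sigma$-flags $F$, we have
$\sum_{i\in I}\alpha_i\,p_\sigma(F_i;F)=0$, where $\alpha_i\in\mathbb{R}$ for all $i\in I$. We use $\mathcal{A}^\sigma$ to denote the set of linear combinations
of flags of type $\sigma$. It is convenient to define a product of flags in the following way:

$$F_1\cdot F_2:=\sum_{F\in\mathcal{F}_\ell^\sigma}p_\sigma(F_1,F_2;F)\,F,\qquad F_1,F_2\in\mathcal{F}^\sigma,\ \ \ell\ge|F_1|+|F_2|-|\sigma|.$$

(Note that because of the chain rule, it does not matter which $\ell$ we choose.)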

To further simplify the notation, we can extend the definitions of $p_\sigma$ and $d_\sigma$ to $\mathcal{A}^\sigma$ by making
them linear in each coordinate. The product notation simplifies these extended definitions, because
$p_\sigma(f_1\cdot f_2;f)=p_\sigma(f_1,f_2;f)$ and $d_\sigma(f_1\cdot f_2;g)=d_\sigma(f_1,f_2;g)$, for any $f_1,f_2,f\in\mathcal{A}^\sigma$ and for any
$g\in\mathcal{A}^0$.

The last piece of notation we introduce is that of the averaging operator. Recall that for any
$\sigma$-flag $F$, we had the normalizing factors $q_\sigma(F)$ such that $d_\sigma(F;G)=q_\sigma(F)\,p(F|_0;G)$. In the syntax
of flag algebras, this averaging operation is denoted by $[\![F]\!]_\sigma:=q_\sigma(F)\cdot F|_0$. We extend this linearly
to all elements of $\mathcal{A}^\sigma$. For example,

$$[\![id_4^1]\!]_{\mathrm{dot}}=id_4,\quad[\![cr_4^1]\!]_{\mathrm{dot}}=cr_4,\quad[\![id_4^1+cr_4^1]\!]_{\mathrm{dot}}=id_4+cr_4,\quad\text{and}\quad[\![W_1]\!]_{\mathrm{dot}}=q_{\mathrm{dot}}(W_1)\,W.$$

This notation is useful because $d_\sigma(f;g)=p([\![f]\!]_\sigma;g)$ for any $f\in\mathcal{A}^\sigma$ and for any $g\in\mathcal{A}^0$, and hence
we have a unified notation for both types of flag densities.


3.3 Extremal problems in the flag algebra calculus 

Recall that our optimization problem is to maximize the density of $cr_4$ amongst all possible tree-pairs.
We will show how flag algebras can be applied to this problem to reduce it to a semi-definite
programming (SDP) problem, which can then be solved numerically.

We may use the chain rule to obtain $d(cr_4;D)=\sum_{H\in\mathcal{F}_t}d(cr_4;H)\,d(H;D)$ for $t\ge4$. Since
$\sum_{H\in\mathcal{F}_t}d(H;D)=1$, we have

$$d(cr_4;D)\le\max_{H\in\mathcal{F}_t}d(cr_4;H),$$

which is a bound that clearly does not depend on $D$. For instance, when we choose $t=6$ we already
obtain $d(cr_4;D)\le[...]$.
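The trivial bound $\max_{H\in\mathcal{F}_t}d(cr_4;H)$ can be computed for small $t$ by enumerating all ordered pairs of labeled trees on $t$ leaves, since every tree-pair of size $t$ arises in this way. This brute-force sketch (hypothetical helper names, exact fractions) returns 1 for $t=5$. We do not state the value for $t=6$ here; it must lie between $\frac{2}{3}$, the average given by Proposition 2.3, and 1.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations
from math import comb


def quartet_set(edges):
    """Map every 4-set of leaves to its quartet (four-point condition on BFS distances)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    leaves = sorted(x for x in adj if len(adj[x]) == 1)
    dist = {}
    for src in leaves:
        seen, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        dist[src] = seen
    quartets = {}
    for a, b, c, d in combinations(leaves, 4):
        pairs = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
        best = min(pairs, key=lambda p: dist[p[0][0]][p[0][1]] + dist[p[1][0]][p[1][1]])
        quartets[frozenset((a, b, c, d))] = frozenset(map(frozenset, best))
    return quartets


def all_trees(labels):
    """Yield each trivalent tree on `labels` exactly once (stepwise leaf insertion)."""
    def grow(edges, remaining, next_id):
        if not remaining:
            yield edges
            return
        w = ("internal", next_id)
        for i, (u, v) in enumerate(edges):
            rest = edges[:i] + edges[i + 1:]
            yield from grow(rest + [(u, w), (w, v), (w, remaining[0])], remaining[1:], next_id + 1)
    hub = ("internal", 0)
    yield from grow([(x, hub) for x in labels[:3]], list(labels[3:]), 1)


def trivial_bound(t):
    """max over tree-pairs H on t leaves of d(cr_4; H), via all ordered pairs of labeled trees."""
    sets = [quartet_set(e) for e in all_trees(list(range(1, t + 1)))]
    worst = max(sum(q1[f] != q2[f] for f in q1) for q1 in sets for q2 in sets)
    return Fraction(worst, comb(t, 4))


print(trivial_bound(5))  # 1
print(trivial_bound(6))
```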

Inequalities obtained this way are often very weak, since we only use very local considerations
about the sub-tree-pairs $H\in\mathcal{F}_t$, and we do not take into account how the tree-pairs fit together in
the larger tree-pair $D$; that is, how they intersect.

One might hope to find inequalities of the form $\sum_{H\in\mathcal{F}_t}a_H\,d(H;D)\ge0$ such that, when we
combine them with the initial identity, we get

$$d(cr_4;D)\le d(cr_4;D)+\sum_{H\in\mathcal{F}_t}a_H\,d(H;D)=\sum_{H\in\mathcal{F}_t}\big(d(cr_4;H)+a_H\big)\,d(H;D)\le\max_{H\in\mathcal{F}_t}\{d(cr_4;H)+a_H\}.$$

Since $a_H$ can be negative for some tree-pairs $H$, the hope is that this will lower the largest coefficients
by transferring weight from them to the smaller ones. In order to find such inequalities, we need another
property of the flag densities.


Proposition 3.3. If $\sigma$ is a fixed type of size $k$, $m\ge1$ is an integer, $\{F_i\}_{i=1}^m$ is a fixed family of
$\sigma$-flags of sizes $|F_i|=\ell_i$, and $\ell\ge k+\sum_{i=1}^m(\ell_i-k)$ is an integer, then for any $\sigma$-flag $F$ of order $n\ge\ell$,
we have

$$p_\sigma(F_1,\dots,F_m;F)=\prod_{i=1}^m p_\sigma(F_i;F)+O(1/n),$$

where the constant in the big-O notation might depend on the family $\{F_i\}_{i=1}^m$.


One can prove Proposition 3.3 by noting that, if we drop the requirement that the sets $X_i$ are
disjoint in the definition of $p_\sigma(F_1,\dots,F_m;F)$, the events $E_i$ become independent, and thus
$\Pr\big[\bigcap_{i=1}^mE_i\big]=\prod_{i=1}^m p_\sigma(F_i;F)$. The error introduced is the probability that these sets
$X_i$ intersect in $F$, which is $O(1/n)$. It is tempting to claim a similar product formula for the
unlabeled flag densities $d_\sigma$, but we cannot do so. In the above equation, it is essential that all the
$\sigma$-flags $F_i$ share the same labeled type $\sigma$, and hence we require $F$ to be a $\sigma$-flag.

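We are now ready to establish some inequalities. Let us first fix a type $\sigma$ of size $k$. If $Q$ is any
positive semi-definite $|\mathcal{F}_\ell^\sigma|\times|\mathcal{F}_\ell^\sigma|$ matrix with rows and columns indexed by the same set $\mathcal{F}_\ell^\sigma$, where
$\ell\ge k$, define the “quadratic form” on flags by

$$Q(\mathcal{F}_\ell^\sigma):=\sum_{F_1,F_2\in\mathcal{F}_\ell^\sigma}Q_{F_1,F_2}\,F_1\cdot F_2\in\mathcal{A}^\sigma.$$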

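Proposition 3.3 yields, for a $\sigma$-flag $F$ of sufficiently large size, the following approximation:

$$p_\sigma\big(Q(\mathcal{F}_\ell^\sigma);F\big)\approx\sum_{F_1,F_2\in\mathcal{F}_\ell^\sigma}Q_{F_1,F_2}\,p_\sigma(F_1;F)\,p_\sigma(F_2;F).\qquad(2)$$

Note that because $Q$ is positive semi-definite, the right-hand side of (2) is always non-negative: it equals $v^{T}Qv$
for the vector $v=(p_\sigma(F';F))_{F'\in\mathcal{F}_\ell^\sigma}$. Even after averaging we obtain:

$$\begin{aligned}
[\![Q]\!]_\sigma(D)&:=p\big([\![Q(\mathcal{F}_\ell^\sigma)]\!]_\sigma;D\big)=\sum_{F_1,F_2\in\mathcal{F}_\ell^\sigma}Q_{F_1,F_2}\,d_\sigma(F_1,F_2;D)\\
&=\sum_{F_1,F_2\in\mathcal{F}_\ell^\sigma}Q_{F_1,F_2}\Big(\sum_{F\in\mathcal{F}_t^\sigma}p_\sigma(F_1,F_2;F)\,d_\sigma(F;D)\Big)\\
&=\sum_{F\in\mathcal{F}_t^\sigma}\Big(\sum_{F_1,F_2\in\mathcal{F}_\ell^\sigma}Q_{F_1,F_2}\,p_\sigma(F_1,F_2;F)\Big)d_\sigma(F;D)\\
&=\sum_{F\in\mathcal{F}_t^\sigma}\Big(\sum_{F_1,F_2\in\mathcal{F}_\ell^\sigma}Q_{F_1,F_2}\,p_\sigma(F_1;F)\,p_\sigma(F_2;F)\Big)d_\sigma(F;D)+O(1/n)\ \ge\ o_{n\to\infty}(1),
\end{aligned}$$

where $n$ is the size of the tree-pair $D$ and $2\ell-|\sigma|\le t\le n$ is some fixed integer. Therefore, when
$n$ is large, $[\![Q]\!]_\sigma(D)$ is asymptotically non-negative. For each admissible tree-pair $H$ of
size exactly $t$, let $a_H=[\![Q]\!]_\sigma(H)=\sum_{F_1,F_2\in\mathcal{F}_\ell^\sigma}Q_{F_1,F_2}\,d_\sigma(F_1,F_2;H)$. We then have

$$[\![Q]\!]_\sigma(D)=\sum_{H\in\mathcal{F}_t}a_H\,d(H;D)\ge o_{n\to\infty}(1).$$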


The expression in the middle of the above equation is called the expansion of $[\![Q]\!]_\sigma(D)$ in tree-pairs
of size $t$, with $a_H$ the coefficients of the expansion. For the sake of conciseness, we often omit the
parameter $D$ and express this asymptotic inequality (combined with the expansion in size $t$) in the
syntax of flag algebras:

$$[\![Q]\!]_\sigma:=[\![Q(\mathcal{F}_\ell^\sigma)]\!]_\sigma=\Big[\!\!\Big[\sum_{F_1,F_2\in\mathcal{F}_\ell^\sigma}Q_{F_1,F_2}\,F_1\cdot F_2\Big]\!\!\Big]_\sigma=\sum_{H\in\mathcal{F}_t}a_H\,H\ge0.\qquad(3)$$


(Note that all inequalities between flags stated in the language of flag algebras are asymptotic.) 

In general, if we have more than one inequality available, we can combine them, provided
they are all expanded in the same size $t$. Suppose we have $r$ inequalities given by the positive semi-definite
matrices $Q_i$, indexed by the $\sigma_i$-flags of size $\ell_i$. Adding them together, we obtain

$$\sum_{i=1}^r[\![Q_i]\!]_{\sigma_i}=\sum_{H\in\mathcal{F}_t}a_H\,H\ge0,$$

where

$$a_H=\sum_{i=1}^r\Big(\sum_{F_1,F_2\in\mathcal{F}_{\ell_i}^{\sigma_i}}(Q_i)_{F_1,F_2}\,d_{\sigma_i}(F_1,F_2;H)\Big),$$

and we want to minimize $\max_{H\in\mathcal{F}_t}\{d(cr_4;H)+a_H\}$.

Thus we have transformed the original problem of finding a minimum upper bound for $d(cr_4;D)$
into a linear system involving the variables $(Q_i)_{F_1,F_2}$. As we have the constraint that the matrices $Q_i$
should be positive semi-definite, this is a semi-definite programming problem. To take the maximum
coefficient in the expansion, we introduce an artificial variable $y$, and require it to be bounded below
by all the coefficients. Hence we have the following SDP problem in the variables $y$ and $(Q_i)_{F_1,F_2}$.

Minimize $y$, subject to the constraints:

• We have $s_H\ge0$ for all $H\in\mathcal{F}_t$, where

$$s_H:=y-d(cr_4;H)-\sum_{i=1}^r\Big(\sum_{F_1,F_2\in\mathcal{F}_{\ell_i}^{\sigma_i}}(Q_i)_{F_1,F_2}\,d_{\sigma_i}(F_1,F_2;H)\Big).\qquad(4)$$

The variables $s_H$ are called surplus variables.

• $Q_i$ is positive semi-definite for $i\in[r]$. (The matrices $Q_i$ are often called the block variables
of the SDP problem. We can assume without loss of generality that each $Q_i$ is symmetric, as
otherwise we could replace $Q_i$ by $(Q_i+Q_i^{T})/2$.)

A computer can solve this SDP problem numerically, allowing for an efficient determination of
the inequalities required to prove the desired bound. We note at this point that the solution to
the SDP problem not only gives the asymptotic bound, but can also provide some structural
information about the extremal tree-pair.



4 Main result 

In this section we discuss some practical considerations regarding the proof of the main theorem. For a square matrix $A$,
let $\mathrm{tr}(A)$ denote its trace. The original formulation of the SDP problem can be rewritten in concise
matrix notation as follows:

$$\begin{aligned}
&\text{minimize} && \mathrm{tr}(C\cdot Q)\\
&\text{subject to} && \mathrm{tr}(A_j\cdot Q)=b_j,\quad\text{for all } j=1,\dots,m,\\
&\text{and} && Q\succeq0\ \ (\text{i.e., } Q\text{ is positive semi-definite}),
\end{aligned}\qquad(5)$$

where $m=|\mathcal{F}_t|$ represents the number of constraints in the problem, $C$ is the cost matrix (we
have $\mathrm{tr}(C\cdot Q)=y$, where $y$ is as in the previous subsection), $Q$ is a positive semi-definite matrix
consisting of all the block-variable matrices $Q_i$, and each equation $\mathrm{tr}(A_j\cdot Q)=b_j$ corresponds to
one of the equations (4) from the original formulation. In particular, if $\mathcal{F}_t=\{H_1,\dots,H_m\}$, we have
$b_j=d(cr_4;H_j)$. Finally, we let $\ell$ denote the number of rows/columns of $Q$.

A computer usually cannot solve (5) exactly, but only approximately. In other words, the output
of the SDP solver will be a matrix $Q'$ that satisfies the constraints approximately. Namely, we have

$$|\mathrm{tr}(A_j\cdot Q')-b_j|\le\epsilon\ \text{ for } j=1,\dots,m,\quad\text{and}\quad Q'+\epsilon I_\ell\succeq0,$$

for some small $\epsilon>0$ (usually $\epsilon<10^{-[...]}$), where $I_\ell$ denotes the $\ell\times\ell$ identity matrix. In what follows
we describe how to obtain a matrix $Q$ that satisfies all the constraints of (5) and is not “too far”
from the approximate solution $Q'$. That way $\mathrm{tr}(C\cdot Q)\approx\mathrm{tr}(C\cdot Q')$.

A natural first step towards this goal is to slightly change $Q'$ so that it satisfies all the linear constraints in (5). For that purpose, we project $Q'$ onto the affine subspace of all $\ell \times \ell$ symmetric real matrices $Q$ that satisfy $\mathrm{tr}(A_j \cdot Q) = b_j$ for all $j = 1, \dots, m$. Let $Q''$ denote this projection. How much did we change the approximate solution? Namely, how large is $\|Q'' - Q'\|_\infty$? Recall that for a matrix $A$, we write $\|A\|_\infty := \max_{i,j} |A_{ij}|$ and $\|A\|_1 := \sum_{i,j} |A_{ij}|$.

To estimate $\|Q'' - Q'\|_\infty$ we use the inequalities of the following proposition.

Proposition 4.1. The following statements are true: 

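(i) If $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$ are two real matrices, then $\|A \cdot B\|_\infty \le n \cdot \|A\|_\infty \cdot \|B\|_\infty$.

(ii) If $A \in \mathbb{R}^{m \times n}$ and $v \in \mathbb{R}^n$, then $\|A \cdot v\|_\infty \le \|A\|_\infty \cdot \|v\|_1$.

(iii) Let $A \in \mathbb{R}^{n \times n}$ be any symmetric matrix. Then $A + n \cdot \|A\|_\infty \cdot I_n$ is positive semi-definite.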

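Proof. (i) Let $C = A \cdot B$. We have $C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$, thus

$$|C_{ij}| \le \sum_{k=1}^{n} |A_{ik}|\,|B_{kj}| \le \sum_{k=1}^{n} \|A\|_\infty \|B\|_\infty = n \cdot \|A\|_\infty \cdot \|B\|_\infty,$$

hence $\|C\|_\infty \le n \cdot \|A\|_\infty \cdot \|B\|_\infty$.

(ii) Let $w = A \cdot v$. We have $w_i = \sum_{j=1}^{n} A_{ij} v_j$, thus

$$|w_i| \le \sum_{j=1}^{n} |A_{ij}|\,|v_j| \le \sum_{j=1}^{n} \|A\|_\infty |v_j| = \|A\|_\infty \cdot \|v\|_1,$$

therefore $\|w\|_\infty \le \|A\|_\infty \cdot \|v\|_1$.

(iii) Let $B = A + n \cdot \|A\|_\infty \cdot I_n$. It suffices to show that $v^T \cdot B \cdot v \ge 0$ for all $v \in \mathbb{R}^n$. Let $a = v^T \cdot A \cdot v$. Using the definition of $B$, we obtain

$$b := v^T \cdot B \cdot v = v^T \cdot A \cdot v + n \cdot \|A\|_\infty \cdot \|v\|_2^2 = a + n \cdot \|A\|_\infty \cdot \|v\|_2^2.$$

Writing $a = (A \cdot v)^T \cdot v$ and applying (ii) twice (first to the $1 \times n$ matrix $(A \cdot v)^T$ and the vector $v$, then to $A$ and $v$), we infer

$$|a| \le \|A \cdot v\|_\infty \cdot \|v\|_1 \le \|A\|_\infty \cdot \|v\|_1^2.$$

So by the Cauchy-Schwarz inequality $\|v\|_1^2 \le n \cdot \|v\|_2^2$, we obtain $|a| \le n \cdot \|A\|_\infty \cdot \|v\|_2^2$, therefore $b \ge 0$, finishing the proof.

□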
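The three inequalities of Proposition 4.1 are easy to test numerically. The sketch below (assuming NumPy; `max_abs` and `check_proposition_4_1` are illustrative names) samples random matrices and checks (i), (ii) and (iii); it is a sanity check, not a substitute for the proof. The matrix with all entries equal to $-1$ shows that the constant $n$ in (iii) cannot be lowered, since its smallest eigenvalue is $-n$.

```python
import numpy as np


def max_abs(M):
    """||M||_inf in the sense of the text: the largest absolute value of an entry."""
    return np.abs(M).max()


def check_proposition_4_1(n=6, trials=200, seed=0, tol=1e-12):
    """Randomised sanity check of Proposition 4.1 (i)-(iii) for n x n matrices."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        A = rng.uniform(-1.0, 1.0, size=(n, n))
        B = rng.uniform(-1.0, 1.0, size=(n, n))
        v = rng.uniform(-1.0, 1.0, size=n)
        # (i)  ||A B||_inf <= n ||A||_inf ||B||_inf
        if max_abs(A @ B) > n * max_abs(A) * max_abs(B) + tol:
            return False
        # (ii) ||A v||_inf <= ||A||_inf ||v||_1
        if max_abs(A @ v) > max_abs(A) * np.abs(v).sum() + tol:
            return False
        # (iii) for symmetric S, S + n ||S||_inf I is positive semi-definite
        S = (A + A.T) / 2
        if np.linalg.eigvalsh(S + n * max_abs(S) * np.eye(n)).min() < -tol:
            return False
    return True


print(check_proposition_4_1())  # True

# The constant n in (iii) is tight: -J (all entries -1) has smallest eigenvalue -n.
n = 5
J = np.ones((n, n))
print(round(np.linalg.eigvalsh(-J).min(), 9))  # -5.0
```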

In what follows, we introduce further notation in order to express $Q''$ in terms of $Q'$ and the parameters of problem (5). Let $\mathcal{S}$ be the linear space of all $\ell \times \ell$ real symmetric matrices, and let $\mathcal{A}: \mathcal{S} \to \mathbb{R}^m$ be the linear map defined by $\mathcal{A}(Q)_j = \mathrm{tr}(A_j \cdot Q)$. In addition, let $b \in \mathbb{R}^m$ be the vector with coordinates $b_j$ for $j = 1, \dots, m$, and let $\mathcal{B}$ be the affine subspace of all $\ell \times \ell$ real symmetric matrices $Q$ that satisfy the linear constraints of (5), namely $\mathrm{tr}(A_j \cdot Q) = b_j$ for $j = 1, \dots, m$. Note that $\mathcal{B}$ is the pre-image of $b$ under $\mathcal{A}$. Let $P$ be the orthogonal projection from $\mathcal{S}$ onto the affine subspace $\mathcal{B}$. One can compute this projection by solving a least squares problem as follows:

$$P(Q) = Q + \mathcal{A}^T \cdot (\mathcal{A} \cdot \mathcal{A}^T)^{-1} \cdot (b - \mathcal{A}(Q))$$

for all $Q \in \mathcal{S}$, where $\mathcal{A}^T: \mathbb{R}^m \to \mathcal{S}$ denotes the transpose of $\mathcal{A}$. We have $Q'' = P(Q')$, thus by Proposition 4.1 (i) we have

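$$\|Q'' - Q'\|_\infty \le m \cdot \|\mathcal{A}^T \cdot (\mathcal{A} \cdot \mathcal{A}^T)^{-1}\|_\infty \cdot \|b - \mathcal{A}(Q')\|_\infty \le \varepsilon m \cdot \|\mathcal{A}^T \cdot (\mathcal{A} \cdot \mathcal{A}^T)^{-1}\|_\infty.$$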

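For all the instances of (5) that we consider, one can verify that $\|\mathcal{A}^T \cdot (\mathcal{A} \cdot \mathcal{A}^T)^{-1}\|_\infty < 1$, thus $\varepsilon' := \|Q'' - Q'\|_\infty < \varepsilon m$. This inequality together with Proposition 4.1 (ii) implies that

$$\mathrm{tr}(C \cdot Q'') = \mathrm{tr}(C \cdot Q') + \mathrm{tr}(C \cdot (Q'' - Q')) \le \mathrm{tr}(C \cdot Q') + \varepsilon m \cdot \|C\|_1.$$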
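The projection $P$ can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' program: it assumes every $A_j$ is symmetric and that the $A_j$ are linearly independent (so that the Gram matrix of the map $Q \mapsto (\mathrm{tr}(A_j \cdot Q))_j$ is invertible), identifies symmetric matrices with vectors via the Frobenius inner product, and solves a linear system instead of forming the inverse explicitly. The helper name `project_affine` is hypothetical.

```python
import numpy as np


def project_affine(Q, A_list, b):
    """Frobenius-orthogonal projection of Q onto {X : tr(A_j X) = b_j for all j}.

    Assumes each A_j is symmetric and the A_j are linearly independent, so the
    Gram matrix M M^T below is invertible and the result is again symmetric.
    """
    ell = Q.shape[0]
    # Row j of M is vec(A_j); for symmetric A_j, M @ vec(X) = (tr(A_j X))_j.
    M = np.array([A.reshape(-1) for A in A_list])
    residual = np.asarray(b, dtype=float) - M @ Q.reshape(-1)
    # P(Q) = Q + A^T (A A^T)^{-1} (b - A(Q)); solve the system instead of inverting.
    y = np.linalg.solve(M @ M.T, residual)
    return Q + (M.T @ y).reshape(ell, ell)


# Toy constraints: tr(X) = 1 and X_12 + X_21 = 0.
A_list = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]
b = [1.0, 0.0]
Q1 = np.array([[0.6, 0.01], [0.01, 0.40001]])   # plays the role of Q'
Q2 = project_affine(Q1, A_list, b)               # plays the role of Q''
print(np.round(Q2, 6))

# Proposition 4.1 (i) bound on the change, with K = A^T (A A^T)^{-1}.
M = np.array([A.reshape(-1) for A in A_list])
K = M.T @ np.linalg.inv(M @ M.T)
res = np.asarray(b) - M @ Q1.reshape(-1)
print(np.abs(Q2 - Q1).max() <= len(b) * np.abs(K).max() * np.abs(res).max())  # True
```

In this toy case the projection removes the off-diagonal entries and rescales the trace to exactly 1.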

We know that $Q''$ satisfies all the linear constraints of (5), but $Q''$ might not be positive semi-definite. An application of Proposition 4.1 (iii) yields that

$$(Q'' - Q') + \ell \cdot \|Q'' - Q'\|_\infty \cdot I_\ell \succeq 0,$$

which, together with the inequality $Q' + \varepsilon I_\ell \succeq 0$ from (6), implies that $Q'' + (\varepsilon + \ell\varepsilon') \cdot I_\ell \succeq 0$. To make $Q''$ positive semi-definite, we hope to find a matrix $\tilde{Q}$ that satisfies $\mathrm{tr}(A_j \cdot \tilde{Q}) = 0$ for $j = 1, \dots, m$ and such that all the eigenvalues of $\tilde{Q}$ are large. If such a $\tilde{Q}$ exists, then $Q'' + \delta\tilde{Q}$ will be positive semi-definite for some small $\delta > 0$ and will also satisfy the linear constraints in (5). For this reason, we consider the following problem:
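The bound $Q'' + (\varepsilon + \ell\varepsilon') I_\ell \succeq 0$ only needs $\varepsilon$, $\ell$ and the entrywise change $\varepsilon' = \|Q'' - Q'\|_\infty$. A minimal sketch (NumPy assumed; `psd_shift` is a hypothetical name) that computes this shift and confirms it on a toy pair of matrices:

```python
import numpy as np


def psd_shift(Q_approx, Q_proj, eps):
    """Return (eps_prime, shift) with Q'' + shift * I positive semi-definite.

    Q_approx : Q', assumed to satisfy Q' + eps * I >= 0 (condition (6))
    Q_proj   : Q'', the projection of Q' onto the affine constraint space
    By Proposition 4.1 (iii), (Q'' - Q') + ell * eps_prime * I >= 0, and adding
    Q' + eps * I >= 0 gives Q'' + (eps + ell * eps_prime) * I >= 0.
    """
    ell = Q_approx.shape[0]
    eps_prime = np.abs(Q_proj - Q_approx).max()
    return eps_prime, eps + ell * eps_prime


eps = 1e-8
Q_approx = np.array([[1.0, 0.0], [0.0, -1e-9]])
Q_proj = Q_approx + np.array([[0.0, 2e-7], [2e-7, -3e-7]])
eps_prime, shift = psd_shift(Q_approx, Q_proj, eps)
smallest = np.linalg.eigvalsh(Q_proj + shift * np.eye(2)).min()
print(smallest >= 0)  # True
```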

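$$\begin{aligned} \text{minimize}\quad & \mathrm{tr}(0 \cdot \tilde{Q}) \\ \text{subject to}\quad & \mathrm{tr}(A_j \cdot \tilde{Q}) = 0 \quad \text{for all } j = 1, \dots, m, \\ & \tilde{Q} \succ 0 \quad (\text{i.e., } \tilde{Q} \text{ is strictly positive definite}) \end{aligned} \tag{7}$$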

Note that the function being minimized is the constant zero function, so (7) is a pure feasibility problem. We again use a computer to obtain an approximate solution $\tilde{Q}'$ to (7). Surprisingly, it turns out that the obtained solution $\tilde{Q}'$ not only satisfies $|\mathrm{tr}(A_j \cdot \tilde{Q}')| < \varepsilon$ for all $j = 1, \dots, m$, but also has a large smallest eigenvalue (much larger than $\varepsilon + \ell\varepsilon'$), even though $|\mathrm{tr}(C \cdot \tilde{Q}')|$ is relatively small. We will later exploit these properties to adjust $Q''$ to an exact solution of (5).

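Using similar ideas as before, we obtain a matrix $\tilde{Q}''$ that satisfies $\mathrm{tr}(A_j \cdot \tilde{Q}'') = 0$ for all $j = 1, \dots, m$ by means of an orthogonal projection of $\tilde{Q}'$ onto the appropriate subspace. As we have already seen, this operation changes the eigenvalues of $\tilde{Q}'$ only slightly. Finally we let $Q := Q'' + \delta\tilde{Q}''$, where $\delta = (\varepsilon + \ell\varepsilon')/\lambda$ and $\lambda$ is the smallest eigenvalue of $\tilde{Q}''$ (in all of our instances $\delta$ is very small). Then $\delta\tilde{Q}'' \succeq (\varepsilon + \ell\varepsilon') \cdot I_\ell$, so $Q$ satisfies all the constraints of (5), including $Q \succeq 0$. Moreover, we have

$$\mathrm{tr}(C \cdot Q) = \mathrm{tr}(C \cdot Q'') + \delta \cdot \mathrm{tr}(C \cdot \tilde{Q}'') \le \mathrm{tr}(C \cdot Q') + \varepsilon m \cdot \|C\|_1 + \delta \cdot \mathrm{tr}(C \cdot \tilde{Q}''),$$

and since both $\varepsilon m$ and $\delta$ are typically small, passing from the approximate solution $Q'$ to the exact solution $Q$ changes the objective value only slightly. Therefore $Q$ is the desired exact solution, which is "close" to $Q'$.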
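The last step can be sketched as follows. This is an illustration under stated assumptions (NumPy; the hypothetical function `combine` takes $Q''$, $\tilde{Q}''$, $\varepsilon$ and $\varepsilon'$ as inputs), not the authors' code. The choice $\delta = (\varepsilon + \ell\varepsilon')/\lambda$ guarantees $\delta\tilde{Q}'' \succeq (\varepsilon + \ell\varepsilon') I_\ell$, which cancels the possibly negative part of $Q''$, while the linear constraints are untouched because $\mathrm{tr}(A_j \cdot \tilde{Q}'') = 0$.

```python
import numpy as np


def combine(Q_proj, Q_tilde_proj, eps, eps_prime):
    """Return (Q, delta) with Q = Q'' + delta * tilde-Q''.

    Q_proj       : Q'', satisfying tr(A_j Q'') = b_j and Q'' + (eps + ell*eps') I >= 0
    Q_tilde_proj : tilde-Q'', satisfying tr(A_j tilde-Q'') = 0; must be positive definite
    """
    ell = Q_proj.shape[0]
    lam = np.linalg.eigvalsh(Q_tilde_proj).min()  # smallest eigenvalue lambda
    if lam <= 0:
        raise ValueError("tilde-Q'' must be positive definite")
    delta = (eps + ell * eps_prime) / lam
    # delta * tilde-Q'' >= (eps + ell*eps') I, so Q is positive semi-definite,
    # and the linear constraints are unchanged because tr(A_j tilde-Q'') = 0.
    return Q_proj + delta * Q_tilde_proj, delta


# Toy instance: one constraint tr(A_1 X) = 0 with A_1 = [[0, 1], [1, 0]].
A1 = np.array([[0.0, 1.0], [1.0, 0.0]])
Q_proj = np.array([[0.5, 0.0], [0.0, -1e-7]])   # slightly indefinite
Q_tilde_proj = np.eye(2)                         # tr(A1 @ I) = 0, lambda = 1
Q, delta = combine(Q_proj, Q_tilde_proj, eps=1e-8, eps_prime=5e-8)
print(np.linalg.eigvalsh(Q).min() >= 0, np.trace(A1 @ Q))  # True 0.0
```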

Table 1 compiles the bounds obtained for several instances of the SDP problem. The first column gives the parameter $t$, which is the size of the tree-pairs used in the expansion of the problem (see Section 3.3 for more details). The second column gives the number $m$ of tree-pairs of size $t$. The third column indicates how many types were used, that is, the types appearing in the inequalities of the form (3). The types used are all those whose size has the same parity as $t$ and is strictly smaller than $t$. The fourth column contains the total number of variables in the SDP instance, including the surplus variables. Finally, the last column gives the bound obtained from the SDP solver. The program used to generate the SDP instances and verify these calculations can be downloaded at http://arxiv.org/src/1505.04344v2/anc.

Remark. It was still feasible to run the program for $t = 9$, but unfortunately the resulting bound was still strictly greater than 2/3.
t    m = |F_t|    # of types used    # of variables    Bound
5            4                  1                50    0.884766
6           31                  3               697    0.760257
7          243                  6             12050    0.707633
8         3532                 35            506171    0.688397

Table 1: Several instances of the main SDP problem.


5 On caterpillar trees 

In this section, we prove Theorem 1.3, a restricted version of Conjecture 1.1 for caterpillar trees. One possible approach to this problem is to apply the same flag-algebra machinery to the theory of tree-pairs restricted to caterpillar trees, and to try to obtain a bound in the same way as we did for Theorem 1.2. However, this approach does not immediately yield the bound of 2/3, so it is necessary (and worthwhile) to think about this problem from a different perspective. In the next few paragraphs we explain how to map the problem of computing the induced density of $\sigma_4$ in a tree-pair of caterpillar trees to a problem of counting induced sub-permutations of size 4.

Suppose $D = \{T_1, T_2\}$ is a tree-pair composed of two caterpillar trees on $n + 2$ leaves (as exemplified in Figure 10). One can think of $D$ as a permutation of $\{\alpha, x_1, \dots, x_n, \beta\}$, a permutation that tells us exactly how the leaves of $T_1$ are "attached" to the leaves of $T_2$. For instance, the tree-pair in Figure 10 could be represented by the permutation $\alpha \mapsto \alpha$, $x_1 \mapsto x_1$, $x_2 \mapsto x_3$, $x_3 \mapsto x_2$, $\beta \mapsto \beta$. In fact, multiple permutations might give rise to the same tree-pair. Regarding this matter, our first observation is that any caterpillar tree on four or more leaves has exactly 8 distinct automorphisms. To illustrate this observation, consider the caterpillar tree on $n + 2$ leaves labelled by $\alpha, x_1, \dots, x_n, \beta$ as depicted in Figure 11. One of the automorphisms of this tree is $\omega_1$, the unique automorphism that maps $\alpha$ to $\beta$ and $\beta$ back to $\alpha$, as in a "reflection". Similarly, $\omega_2$ is the automorphism that swaps only $\alpha$ and $x_1$ and leaves all the remaining vertices in place. Finally, $\omega_3$ is the automorphism that swaps $\beta$ and $x_n$. The group of automorphisms can then be written as $\{\omega_1^{i_1}\omega_2^{i_2}\omega_3^{i_3} : i_1, i_2, i_3 \in \{0, 1\}\}$. Our second observation is that, given a permutation $\pi$ and two automorphisms $\sigma, \sigma'$ of the caterpillar tree with leaves labelled $\{\alpha, x_1, \dots, x_n, \beta\}$, the permutation $\sigma\pi\sigma'$ represents the same tree-pair as $\pi$ itself. Here we think of $\sigma$ and $\sigma'$ as acting solely on the leaves of the caterpillar trees.
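The group structure described above can be checked mechanically. In the plain-Python sketch below (all names are illustrative), leaves are encoded as 0 for $\alpha$, $i$ for $x_i$ with $1 \le i \le n$, and $n + 1$ for $\beta$, and a leaf map is a tuple `t` with `t[v]` the image of leaf `v`. The three generators are the reflection, the swap of $\alpha$ and $x_1$, and the swap of $\beta$ and $x_n$; the function returns all products with each generator used zero or one times. The helper `same_pair_class` lists the permutations $\sigma\pi\sigma'$ that, by the second observation, represent the same tree-pair as $\pi$.

```python
from itertools import product


def compose(f, g):
    """(f o g)(v) = f(g(v)) for leaf maps stored as tuples."""
    return tuple(f[g[v]] for v in range(len(g)))


def caterpillar_automorphisms(n):
    """Leaf automorphisms of the caterpillar alpha, x_1, ..., x_n, beta (n >= 2).

    Encoding: 0 = alpha, i = x_i for 1 <= i <= n, n + 1 = beta.
    """
    N = n + 2
    identity = tuple(range(N))
    reflect = tuple(N - 1 - v for v in range(N))  # alpha <-> beta, x_i <-> x_{n+1-i}
    swap_alpha = tuple({0: 1, 1: 0}.get(v, v) for v in range(N))        # alpha <-> x_1
    swap_beta = tuple({N - 1: n, n: N - 1}.get(v, v) for v in range(N))  # beta <-> x_n
    group = set()
    for a, b, c in product((0, 1), repeat=3):
        g = identity
        for gen, use in ((reflect, a), (swap_alpha, b), (swap_beta, c)):
            if use:
                g = compose(g, gen)
        group.add(g)
    return group


def same_pair_class(pi, group):
    """All permutations sigma o pi o sigma' with sigma, sigma' automorphisms."""
    return {compose(s, compose(pi, t)) for s in group for t in group}


G = caterpillar_automorphisms(4)
print(len(G))  # 8
print(all(compose(f, g) in G for f in G for g in G))  # True (closed under composition)
```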





Figure 11: The automorphisms of a caterpillar tree. 


Given a permutation $\pi$ of $L := \{\alpha, x_1, \dots, x_n, \beta\}$, how do we count the number of induced copies of $\sigma_4$ in the corresponding tree-pair $D$ represented by $\pi$? Suppose $S \subset L$ is a subset of size 4 such that $\alpha, \beta \notin S$ and $\alpha, \beta \notin \pi(S)$, say $S = \{x_{i_1}, x_{i_2}, x_{i_3}, x_{i_4}\}$ with $\pi(x_{i_t}) = x_{j_t}$ for $t = 1, \dots, 4$ and $i_1 < i_2 < i_3 < i_4$. Here it is helpful to think of $S$ as a subset of the leaves of $T_1$ before the identification with the leaves of $T_2$. The leaves selected by $S$ induce a copy of $\mathrm{id}_4$ in $D$ if either $\max\{j_1, j_2\} < \min\{j_3, j_4\}$ or $\max\{j_3, j_4\} < \min\{j_1, j_2\}$ (in $T_1$ these leaves form the quartet $[x_{i_1}x_{i_2}|x_{i_3}x_{i_4}]$, while in $T_2$ the two leaves with the smaller indices among $j_1, \dots, j_4$ are separated from the other two); otherwise $S$ induces a copy of $\sigma_4$. Since there are only $O(n^3)$ subsets $S$ that do not satisfy the condition $\{\alpha, \beta\} \cap (S \cup \pi(S)) = \emptyset$, the problem of computing the density of $\mathrm{id}_4$ in $D$ essentially becomes the problem of computing the density of the following induced sub-permutations in a permutation $\pi \in S_n$:

$$1234,\ 1243,\ 2134,\ 2143,\ 3412,\ 4312,\ 3421,\ 4321. \tag{8}$$
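The criterion above and the list (8) can be cross-checked by brute force. The following Python sketch (names such as `induces_id4` are illustrative) verifies that, among the 24 patterns of size 4, the condition $\max\{j_1, j_2\} < \min\{j_3, j_4\}$ or $\max\{j_3, j_4\} < \min\{j_1, j_2\}$ holds exactly for the eight permutations in (8), and computes the density of these patterns in a given permutation in $O(n^4)$ time.

```python
from itertools import combinations, permutations
from math import comb

# The eight patterns listed in (8).
ID4_PATTERNS = {tuple(int(c) for c in s) for s in
                ("1234", "1243", "2134", "2143", "3412", "4312", "3421", "4321")}


def pattern(values):
    """Relative order of distinct numbers, e.g. (7, 2, 9, 5) -> (3, 1, 4, 2)."""
    ordered = sorted(values)
    return tuple(ordered.index(v) + 1 for v in values)


def induces_id4(j):
    """Criterion from the text for j = (j1, j2, j3, j4)."""
    j1, j2, j3, j4 = j
    return max(j1, j2) < min(j3, j4) or max(j3, j4) < min(j1, j2)


def criterion_matches_list():
    """True if the criterion selects exactly the patterns in (8)."""
    return all(induces_id4(p) == (p in ID4_PATTERNS)
               for p in permutations(range(1, 5)))


def id4_density(pi):
    """Density of the patterns in (8) among all 4-element subsequences of pi."""
    n = len(pi)
    hits = sum(1 for idx in combinations(range(n), 4)
               if pattern([pi[i] for i in idx]) in ID4_PATTERNS)
    return hits / comb(n, 4)


print(criterion_matches_list())        # True
print(id4_density((1, 2, 3, 4, 5)))    # 1.0
print(id4_density((1, 3, 2, 4)))       # 0.0
```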

The machinery of flag algebras is very general and can also be applied to the theory of permutations. In fact, we have the following theorem, which implies Theorem 1.3.

Theorem 5.1. The sum of the densities of the permutations listed in (8) inside a permutation $\pi \in S_n$ is at least $1/3 + o(1)$ for large $n$.

Proof. Using the notation of flag algebras, let $\varphi$ denote the sum of the densities of the permutations in (8), that is,

$$\varphi = 1234 + 1243 + 2134 + 2143 + 3412 + 4312 + 3421 + 4321.$$

In this notation, a flag is just a permutation with some entries labeled by the set $[k]$ for some $k > 0$. For instance, consider the flag whose underlying permutation is 164235, in which the fifth entry is labeled 1 and the second entry is labeled 2. The type of this flag is the permutation 21 whose first entry is labeled 2 and whose second entry is labeled 1, since the sub-permutation induced by the labeled entries is isomorphic to 21 (corresponding to the entries 63). Another example is a flag with underlying permutation 5172364, whose type is again given by the sub-permutation induced by its labeled entries. With these definitions in mind, we remark that a type is just a permutation of $[k]$ with all entries labeled by elements of the set $[k]$. Thus, for an integer $k > 0$, types on $k$ entries are in one-to-one correspondence with pairs of permutations of $[k]$.
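To make the notions of flag and type concrete, here is a small Python sketch. The encoding is an assumption made for illustration: a flag is a permutation together with a dictionary from 1-based positions to labels, and a type is represented by the pair (induced pattern, labels read from left to right). With this encoding, the first example above has type `((2, 1), (2, 1))`, and the types on $k$ entries correspond to pairs of permutations of $[k]$, giving $(k!)^2$ of them; in particular there are exactly 4 types of size 2.

```python
from itertools import permutations


def pattern(values):
    """Relative order of distinct numbers, e.g. (6, 3) -> (2, 1)."""
    ordered = sorted(values)
    return tuple(ordered.index(v) + 1 for v in values)


def flag_type(perm, labels):
    """Type of a flag.

    perm   : underlying permutation, e.g. (1, 6, 4, 2, 3, 5)
    labels : dict {1-based position: label}, with labels forming [k]
    Returns (pattern of the labeled entries, their labels from left to right).
    """
    positions = sorted(labels)
    induced = pattern([perm[p - 1] for p in positions])
    return induced, tuple(labels[p] for p in positions)


def all_types(k):
    """All types on k entries: a permutation of [k] plus a labelling of its entries."""
    perms = list(permutations(range(1, k + 1)))
    return [(p, lab) for p in perms for lab in perms]


# Flag on 164235 with the fifth entry labeled 1 and the second entry labeled 2.
print(flag_type((1, 6, 4, 2, 3, 5), {5: 1, 2: 2}))  # ((2, 1), (2, 1))
print(len(all_types(2)))  # 4
```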

Consider the following 4 types of size 2 

^>1 = ®1@2. = @,®2, 
n = ® 2 ®i. P‘ = ® 2 ®r 

^ = 5 + E 3 ■ IK^’. - + 6 ■ [[(Vi - Z.)"ll„ (9) 

i=l 

Vl = -1 @i@ 24 - 1@,@23 + + 3@,®2l. 

r, = +I@i3@, + l@i4(3), - 4@,l(3), - 3(2),1@,. 

Z, =+@,l@j4 + @,l@j3-@,4@jl 
V2 = -4®,©,! - 4@,(l)j2 + l@,@j4 + 2©,®,4, 
r2 = +4©,2®, + 4©,1©, - 1©,4©, - 2©,4®j. 

Zj = +©,4@,1 + ©i4®,2 - ©,1@,4 - @,2®,4, 

Vs = -1©2©,4 - 2®,©,4 + 4©,©,1 + 4®j©,2, 
n = +®j2©,4 + ©jl©,4 - @j4©,l - ®,4©,2, 

Zs = +I@j4©, + 2®,4©, - 4@,1©, - 4®j2©,. 

V4 = -4@j@,l - 3(5,©,1 + l©,@i4 + 1©@,3, 
n = +®23©,1 + ©24©,! - ©2l©4 - @,1©,3, 

Z4 = +4©,1@, + 3@,1©, - 1©,4©, - 1@,3@,, 

Therefore $\varphi \ge 1/3$, thereby proving the theorem. Note that in order to verify the correctness of (9), it suffices to evaluate the left- and right-hand sides of the equation for all permutations of size 6. □
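The constant 1/3 in Theorem 5.1 can be compared with a uniformly random permutation: the pattern formed by any four fixed positions is then uniform over the 24 permutations of size 4, and 8 of them appear in (8), so the expected density is exactly $8/24 = 1/3$. This observation is not part of the proof above; the Python sketch below merely confirms it with exact rational arithmetic for small $n$ (names are illustrative).

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb, factorial

# The eight patterns listed in (8).
ID4 = {(1, 2, 3, 4), (1, 2, 4, 3), (2, 1, 3, 4), (2, 1, 4, 3),
       (3, 4, 1, 2), (4, 3, 1, 2), (3, 4, 2, 1), (4, 3, 2, 1)}


def pattern(values):
    """Relative order of distinct numbers as a tuple of ranks 1..k."""
    ordered = sorted(values)
    return tuple(ordered.index(v) + 1 for v in values)


def id4_density_exact(pi):
    """Exact density (as a Fraction) of the patterns in (8) inside pi."""
    hits = sum(1 for idx in combinations(range(len(pi)), 4)
               if pattern([pi[i] for i in idx]) in ID4)
    return Fraction(hits, comb(len(pi), 4))


def average_density(n):
    """Average of id4_density_exact over all n! permutations of size n."""
    total = sum(id4_density_exact(p) for p in permutations(range(1, n + 1)))
    return total / factorial(n)


for n in (4, 5, 6):
    print(n, average_density(n))  # prints 1/3 for each n
```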


6 Concluding remarks


In Theorem 1.2 we showed that the maximum quartet distance between two arbitrary phylogenetic trees on $n$ leaves is at most $(0.69 + o(1))\binom{n}{4}$. It would be interesting to know whether the techniques of this paper can be pushed even further to obtain the bound $(2/3 + o(1))\binom{n}{4}$, thereby establishing Conjecture 1.1. Another approach to Conjecture 1.1 is to solve an extremal problem in the theory of 4-uniform hypergraphs. In [1], Alon et al. proved an asymptotic upper bound on the maximum quartet distance by mapping a tree-pair into a 4-uniform hypergraph in the following way. The vertices of the hypergraph are the leaves of the tree-pair, and a subset $S$ of 4 leaves is an edge of the hypergraph if the sub-tree-pair induced by $S$ is isomorphic to $\sigma_4$. They showed that the resulting hypergraph does not contain a copy of $K_6^{(4)}$, the complete 4-uniform hypergraph on 6 vertices. Moreover, not only $K_6^{(4)}$ but also several other hypergraphs cannot appear as induced subgraphs of such a hypergraph. A natural question emerges: can one characterize this family of forbidden subgraphs? In particular, is it finite?
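For caterpillar tree-pairs the hypergraph above can be generated directly from the permutation encoding of Section 5. The sketch below is illustrative only: it assumes the pair is encoded by a permutation $\pi$ of the leaves $x_1, \dots, x_n$ (with $\alpha$ and $\beta$ ignored), takes a 4-set of positions as an edge when its pattern is not among the eight patterns in (8), i.e. when it induces $\sigma_4$, and includes a brute-force test for a copy of $K_6^{(4)}$ that could be used to explore small cases. The function names are hypothetical.

```python
from itertools import combinations

# The eight patterns listed in (8); every other pattern of size 4 induces sigma_4.
ID4 = {(1, 2, 3, 4), (1, 2, 4, 3), (2, 1, 3, 4), (2, 1, 4, 3),
       (3, 4, 1, 2), (4, 3, 1, 2), (3, 4, 2, 1), (4, 3, 2, 1)}


def pattern(values):
    """Relative order of distinct numbers as a tuple of ranks 1..k."""
    ordered = sorted(values)
    return tuple(ordered.index(v) + 1 for v in values)


def quartet_hypergraph(pi):
    """Edges (4-sets of 0-based positions) whose induced tree-pair is sigma_4."""
    return [idx for idx in combinations(range(len(pi)), 4)
            if pattern([pi[i] for i in idx]) not in ID4]


def contains_complete_k6(pi):
    """Brute-force check for six positions whose 15 quadruples are all edges."""
    edges = set(quartet_hypergraph(pi))
    return any(all(q in edges for q in combinations(six, 4))
               for six in combinations(range(len(pi)), 6))


print(quartet_hypergraph((1, 2, 3, 4, 5)))       # []
print(quartet_hypergraph((1, 3, 2, 4)))          # [(0, 1, 2, 3)]
print(contains_complete_k6((1, 2, 3, 4, 5, 6)))  # False
```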
Acknowledgement We thank Sagi Snir and Raphy Yuster for helpful discussions. 